Assurance of AI-enabled systems - Preprint 


Odd Ivar Haugen 
email: odd.ivar.haugen@dnv.com 


DNV, Group Research & Technology, Trondheim, Norway 


February 8, 2024 


1 Abstract 


Society increasingly relies on systems whose behaviour is (in part) determined by artificial intelligence (AI). 
The value these systems bring depends on the difference between the positive and negative contributions they 
deliver. To understand and predict the system behaviour in order to maximise its value, that is, securing the 
positive contribution and avoiding the negative, knowledge about its behaviour is key. Assurance generates 
this knowledge, enabling stakeholders to make informed decisions concerning the development and use of 
such systems. 

AI-enabled systems interact with both humans and other technical systems, creating intricate 
interdependencies. Such systems are termed complex systems: their behaviour depends not on the properties 
of each constituent, but on the interactions between them. Knowledge about the behaviour of such systems, 
and thereby the assurance, needs to take a systems approach capable of revealing these interactions. 

Risk describes the degree and uncertainty about the value AI systems bring to stakeholders. The tradi- 
tional understanding of risk, that is, a combination of consequence and likelihood, is inadequate to capture 
the essence of risk in such systems. Therefore, a more suitable understanding of the risk concept is adopted 
which expands the likelihood to also capture epistemic uncertainty. 

As AI represents a novel technology, knowledge about its properties is scarce. Therefore, an 
evidence-based argumentation is adopted so that the knowledge can be treated systematically and 
transparently, and thus be scrutinised. 


2 Introduction 


A prerequisite for AI fulfilling its value proposition is that society has grounds for justified confidence, that is, 
assurance that these systems are trustworthy and that they are developed, used and managed responsibly. To 
enable the demonstration of these traits, DNV has published a new Recommended Practice (RP): “Assurance 
of AI-enabled systems”. 

Four principal criteria must be in place in this demonstration: 


1. the system requirements must reflect the interest of society, 


2. refining these requirements into technical specifications must maintain the essence of these require- 
ments, 


3. the system’s adherence to these requirements must be secured and adequately substantiated, 


4. criteria 1, 2, and 3 must be communicated to the stakeholders or their representatives in such a way 
that they can make informed decisions. 


The objective of this paper is to discuss the theoretical foundation behind the content of the newly 
published Recommended Practice (RP) Assurance of AI-enabled systems [11]. 

Governments and legislative authorities are developing legislation and regulations for the use of AI, 
such as [9], and standardisation institutions like ISO are developing new international standards addressing 
technical aspects of AI [1]. 

There is a race among actors, not only technology and system providers and companies that want to 
utilise the competitive advantages that AI can bring, but also among countries to position themselves at the 
forefront of this development. Even among legislators, there is a race to facilitate a more widespread use of 
AI. Apart from being at the forefront of bringing AI into society, these initiatives aim to avoid or limit the 
negative effect AI may have while maximising its value. However, standard utility theory states that there 
is a sweet-spot where expected value and risk are balanced. Therefore, to maximise the value, it is vital to 
reduce the uncertainty. 
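
The link between uncertainty and value can be illustrated with a minimal sketch. It assumes an exponential 
utility function and a normally distributed value, neither of which is stated in this paper; under those 
assumptions the certainty equivalent equals the expected value minus half the risk-aversion coefficient 
times the variance. The function name and all numbers are illustrative only.

```python
def certainty_equivalent(mean: float, std: float, risk_aversion: float) -> float:
    """Certainty equivalent of an uncertain value.

    Assumptions (not from the paper): utility u(v) = -exp(-a * v) and
    v ~ Normal(mean, std**2), for which CE = mean - 0.5 * a * std**2.
    """
    return mean - 0.5 * risk_aversion * std ** 2


# Same expected value, two levels of uncertainty (illustrative numbers).
uncertain = certainty_equivalent(mean=100.0, std=20.0, risk_aversion=0.05)
assured = certainty_equivalent(mean=100.0, std=5.0, risk_aversion=0.05)
print(uncertain, assured)
```

With the expected value held fixed, the lower-uncertainty case has the higher certainty equivalent, which is 
one way to read the statement that reducing uncertainty increases value.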

Although the term Artificial Intelligence was coined in 1956, large-scale deployment of AI into society is 
relatively new. “Everybody”, that is, developers, regulators and legislators, users, and the society at large, 
therefore lacks experience with, and knowledge of, this technology and its effect on society. Society requires 
assurance that the risk is acceptable. However, determining risk requires predicting the consequences for 
society, which means predicting the behaviour of AI-enabled systems. 

Most AI-enabled systems are inherently complex; that is, they show emergent behaviour. This invalidates, 
to a large extent, a reductionist approach to understanding and predicting the behaviour. Traditional meth- 
ods are mostly based on such a reductionist approach, which means that we need alternative methodology 
in the analysis to understand the behaviour of these systems. 

The complexity, the novelty of applications, and the fundamental technological shift invalidate, to a large 
extent, the use of historical data in predicting the future (frequentist approach). Risk has been understood 
as some combination of the consequence of a loss-event and its probability of occurrence. Estimating the 
probability of an event and its consequence when unknown factors determine the event may not adequately 
express the inherent epistemic uncertainty. Abandoning a unilateral frequentist approach challenges the tra- 
ditional understanding of risk and its representation. A broader approach to risk, not limited to probabilities, 
is needed to properly capture uncertainty. 

A commonly used definition of knowledge is: “Justified true belief”. Some philosophers have abandoned 
this definition because of several problems in the relation between the three terms in the definition: even 
strong justification backing a belief cannot guarantee truth; an assertion without justification may turn out 
to be true; and an honest belief backed by seemingly adequate justification may be true even though the 
justification turns out to be invalid [45]. 

Still, we need to explicitly represent what we (think we) know, whether it turns out to be the truth or 
not, that addresses recognised elements that constitute knowledge. As AI represents unknown territory in 
many aspects, we need to treat new knowledge systematically and be transparent so it can be scrutinised and 
properly communicated to the stakeholders. Unsubstantiated and sometimes even manipulative propositions 
about the capabilities of AI may lead to misunderstandings, degrade the potential value, and increase the 
risk to stakeholders and society. 

Society is not a uniform group of people with common interests; society consists of various groups in 
different roles with diverse interests, often conflicting. Some of these are short-term, and others are long-term 
goals. Consequently, the system specifications may become a trade-off between these diverse and sometimes 
fluctuating interests. Even the same stakeholders may hold short-term interests that conflict with their long-term 
interests. Moreover, as AI systems evolve with society, stakeholders may modify their initial expressed 
interests. A trade-off may result; that is, legitimate interests may have to, at least partly, be disregarded. 
The alignment between the specifications and the different interests may become challenging, so guidance is 
necessary. 

Traditional software development follows well-proven and standardised processes. The behaviour of AI 
systems is not necessarily programmed, as in traditional software systems, but learned from data. Therefore, 
traditional development processes become inadequate because they do not address the concept of learned 
behaviour from data. Although traditional development processes may differ between industries, applications 
and risk levels, standards exist describing good practice, such as [3]. New AI development processes are 
being developed and standardised, which, naturally, do not have the same track record as the traditional 
processes. Moreover, AI systems may change and optimise their behaviour in response to usage while in 
operation. These fundamental differences require close monitoring in the entire AI system lifecycle, from 
concept to operation and retirement. 

A well-designed and well-known work process increases the likelihood of adequate output quality. Other 
aspects that will affect the output quality are the competence of the people performing the tasks and activities 
and the maturity of the organisation responsible for implementing the process. The development of AI systems 
cannot be said to possess any of these properties: the AI lifecycle process is not well established, matured or 
optimised; AI is a novel technology, so the competence among practitioners may be inadequate; and the maturity 
of the organisations may also be inadequate. Therefore, there is a need not only to monitor the lifecycle 
process, but also to iterate the monitoring and control of each step within the process. 


3 The Systems Approach to handle system complexity and emergence 


AI-enabled systems consist of components and agents (human and artificial) that interact and perform a 
series of interdependent actions to achieve goals in different environments that, themselves, are systems 
with non-trivial interacting components. The system properties and behaviour cannot be understood by 
investigating single components inside the system. Due to the interactions and interdependencies between 
components and agents, and between the system and its environment, the system properties and behaviour 
are emergent. 

Such properties do not exist in each component, but emerge due to their interactions. By reducing the 
system into its components, the properties are lost, and therefore become unobservable. Such properties can 
be said to be computationally irreducible [56]. 

The growth/decline of macroeconomics and the stock market, the social life of army ants, the wetness 
of a raindrop, human culture, the global climate, a city’s resilience against a catastrophe, and system safety 
are all examples of emergent behaviour or properties [34, 40, 38]. 


3.1 Complexity - emergence and the CESM metamodel 


Complexity and emergent behaviour, or emergent properties, are closely related; hence, the science of emer- 
gence is really about complexity [53, 34]. There is no single and all-encompassing definition of either com- 
plexity or emergence. In the same way, a universal understanding of how to measure them does not exist 
among either scientists or philosophers [40]. One reason for the lack of a definition is that complexity can 
come in many forms, such as [40, 38, 35]: 


• size, 
• level of entropy, 
• logical and functional depth, 
• level and amount of interaction and interdependencies among system entities, 
• non-linear causes and effects, 
• feedback loops, 
• number of system states, 
• intricate transition rules between states. 


These forms of complexity indicate intractability; that is, understanding, explaining and predicting the 
behaviour of a complex system is non-trivial. 

However, it is important to notice that complexity and emergence are properties of the system, not of 
epistemology [14]. Explained emergence (and complexity) is still emergence [16]; that is, a system does not 
cease to be complex just because we understand (to a certain degree) its behaviour. 

A way to understand and analyse complex systems and emergence is to model the system behaviour in 
terms of its composition, structure, mechanisms and the environment in which it operates. These system 
aspects are termed the CESM metamodel [16]: 


• Composition (C): Collection of all the parts or objects in the system. 
• Environment (E): Systems outside (excluded from) the target system that act upon, or are acted 
  upon by, the target system. 
• Structure (S): The relationships and bonds among the system agents and between the system agents 
  and the environment. 
• Mechanisms (M): The processes that make the system behave in the way that it does. 


The emergent behaviour becomes a function of the above elements; that is, any system s may be modelled, 
at any given instant, as the quadruple: p(s) = <C(s), E(s), S(s), M(s)>. 

Complexity can be understood in the context of the same elements [29]: 


• Composition: Number of system objects, parts and elements. Size of composition hierarchies. 
• Environment: Size of state space, number of agents and their autonomy, (lack of) rules of interaction 
  with the system. 
• Structure: The stability of the relationship, responsibility and authority between the system agents, 
  and between the system agents and the environment. The degree of cooperation needed to achieve a 
  goal. 
• Mechanisms: Number of functions, what agent can/must perform them, needed resources, number 
  of preconditions, possible postconditions, and the control of their execution. 


In short, emergent properties result from the conceptual interaction between the elements in the CESM 
metamodel, and complexity can be thought of by how intricate these interactions are. 

To investigate the nature of such interactions, the bonds, roles, and responsibilities of agents (the Struc- 
ture), how they interact (the Mechanisms), and the properties of the system components (the Composition) 
must be analysed and synthesised. 

The above pseudo-definition of emergence and complexity does not entirely describe what these concepts 
entail; however, it is helpful when developing a framework for understanding and analysing complex systems. 
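
As a minimal sketch of how the quadruple p(s) could be held as a data structure, the example below stores the 
four CESM elements as sets. The class name, the two helper methods and the vessel example are hypothetical 
and are not taken from the RP or from [16]; the density figure is only a crude indicator, not a 
recognised complexity metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CESMModel:
    """Snapshot p(s) = <C(s), E(s), S(s), M(s)> of a system s at one instant."""
    composition: frozenset  # C(s): parts or objects in the system
    environment: frozenset  # E(s): outside systems acting on, or acted on by, s
    structure: frozenset    # S(s): (source, target) relations among entities
    mechanisms: frozenset   # M(s): names of the processes that drive behaviour

    def interaction_density(self) -> float:
        """Relations per entity; a crude, hypothetical indicator of intricacy."""
        entities = self.composition | self.environment
        return len(self.structure) / len(entities) if entities else 0.0

    def dangling_relations(self) -> set:
        """Relations whose endpoints appear in neither C(s) nor E(s)."""
        entities = self.composition | self.environment
        return {(a, b) for a, b in self.structure
                if a not in entities or b not in entities}


# Illustrative system; all names are invented for this example.
vessel = CESMModel(
    composition=frozenset({"collision_avoidance_ai", "radar", "helmsman"}),
    environment=frozenset({"other_vessel"}),
    structure=frozenset({
        ("radar", "collision_avoidance_ai"),
        ("collision_avoidance_ai", "helmsman"),
        ("other_vessel", "radar"),
    }),
    mechanisms=frozenset({"detect_target", "propose_manoeuvre", "steer"}),
)
print(vessel.interaction_density())  # 3 relations over 4 entities
print(vessel.dangling_relations())   # empty when S(s) only refers to known entities
```

Keeping the environment as a separate set makes the system boundary explicit, which is the part of the 
quadruple that depends most on the observer's choice.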


3.2 Levelism 


What constitutes a system depends on the observer’s point of view [55]. For one observer, an entity may 
be seen as a system with interacting components, while for another, the same entity can be seen as a 
(single) component within a larger system¹. 

System behaviour can be analysed (explained) at different levels of abstraction (LoAs), depending on the observer’s viewpoint; 
that is, depending on the knowledge we seek [23]. Hence, interactions and dependencies must also be 
explained at different LoAs. This means that (abstract) system constituents (items, agents and actions) 
must be identified at different LoAs [30]. 

An analysis at one LoA is not ‘better’ than at another; they are just different because they provide 
different kinds of knowledge about the system. The search for knowledge in the current context, driven by 
the objective of the analysis, guides our choice for LoAs, that is, epistemic levelism. 

We may divide levelism (LoA) into epistemic and ontological. Epistemic levelism addresses the kind of 
knowledge that we seek; ontological levelism is how (we choose to) divide the system into levels of detail. 
The two kinds of levelism are often closely related; that is, how the system is divided into levels is often 
related to what kind of knowledge we seek. 


3.3 System models 


Models representing the system are abstractions of constituents and their relationships and bonds. The 
entities within the system models are also abstractions. The entities included in a system model at certain 
LoAs may not exist in the actual system or even be planned to exist. The names of the system model entities 
may indicate their function, role, type, or other features. 

More than a single system model is needed to address p(s). As the conceptual interaction between the 
elements of the CESM metamodel is both necessary and sufficient to describe any system behaviour, the 
collection of system models must address every element of the CESM metamodel at the LoAs (epistemic 
and ontological) needed to gain adequate knowledge [33]. Moreover, they must also be connected so that 
the emergent system behaviour, p(s), becomes observable. 

For each element in the CESM metamodel, we can assign different model categories. Moreover, the model 
categories must be connected to elicit p(s). 

The following model categories represent the CESM metamodel: 


• Composition: Object model representing the system elements and components and their ontological 
  relationship to each other. 
• Environment: Also modelled as a system containing all aspects of the CESM metamodel, which means 
  that the environment must be represented by models representing the composition, structure and 
  mechanisms (our target system is part of the environment of its environment). 
• Structure: Agent model includes entities such as controllers, actuators, sensors, humans, and AI 
  subsystems. The agent concept includes authority, responsibility, goals, concerns, motivation, and 
  wishes (humans). 
• Mechanisms: Function model represents the operations that must be performed (by the agents) to 
  achieve goals. 


A specific system model is an instantiation of the above categories. A control structure including a 
controller, control actions², feedback, and a controlled process, known from Systems-Theoretic Process 
Analysis (STPA) [38], is one instance of an agent model. Another agent model may focus more on the agent’s 
goals, motivation, concerns and wishes, like a model used in a stakeholder analysis where social and business 
aspects are emphasised. 

A function model may focus on the preconditions, resources, and timing for achieving it, like the model 
used in Functional Resonance Analysis Method (FRAM) [34]. Or, it may focus on functional dependencies 
to other functions, like in Functional Analysis System Technique (FAST) [17]. 


¹ Sometimes the term “system of systems” is used in the literature. This term is superfluous because all 
systems are in fact systems of systems, so the term does not bring any further insight. 
² An action is a function performed by an agent. 


The different models give different views of the same system, which means that the models should be 
consistent. Every model has qualities the others lack; however, they need points of contact to ensure their 
consistency; they need to ‘borrow’ some aspects from each other [33]. The models should be distinguished, 
not detached or isolated. On top of these borrowed aspects, consistency rules regulate their relationship. 

These relationships and rules increase rigour (formalism) in revealing the system behaviour. This is 
important for the objectivity, the transparency, and thereby the trustworthiness in any context in which 
these models are used. 
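
The idea of borrowed aspects and consistency rules can be sketched as a small check across an object model, 
an agent model and a function model. The two rules below, the function name and the example data are 
illustrative assumptions and are not rules prescribed by the RP or by [33].

```python
def check_consistency(objects: set, agents: set, functions: dict) -> list:
    """Return a list of consistency violations between three system models.

    objects   -- object model: names of system elements (Composition)
    agents    -- agent model: names of agents (Structure)
    functions -- function model: function name -> set of performing agents (Mechanisms)
    Both rules are hypothetical examples of consistency rules.
    """
    violations = []
    # Rule 1: every agent 'borrows' its identity from an element of the object model.
    for agent in sorted(agents - objects):
        violations.append(f"agent '{agent}' has no counterpart in the object model")
    # Rule 2: every function is performed by at least one agent known to the agent model.
    for name, performers in sorted(functions.items()):
        if not performers:
            violations.append(f"function '{name}' is not assigned to any agent")
        for performer in sorted(performers - agents):
            violations.append(f"function '{name}' refers to unknown agent '{performer}'")
    return violations


# Deliberately inconsistent example models (all names invented).
objects = {"controller", "sensor", "actuator"}
agents = {"controller", "operator"}
functions = {"measure": {"sensor"}, "decide": {"controller"}, "log": set()}
issues = check_consistency(objects, agents, functions)
for issue in issues:
    print(issue)
```

Expressing the rules as executable checks is one way to obtain the rigour mentioned above, since every 
violation is reported explicitly and can be reviewed.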


4 The ontology of assurance 


Confidence can be thought of, in statistical terms, as a quantitative measurement of uncertainty, e.g. an 
interval within which the value of a parameter is likely to fall. However, confidence 
may also be thought of as a feeling that reflects the coherence of the information and the cognitive ease of 
processing it [37]. Assurance is defined as “grounds for justified confidence that a claim has been or will be 
achieved” [6]. The definition does not limit assurance to one or the other type of confidence; hence, assurance 
addresses both. 

Both types of confidence pose challenges. The frequentist approach to quantifying uncertainty requires 
robust statistical data. Here lie a few major obstacles, some of which are: the inherent complexity of 
AI systems, the novelty of the technology, the difficulty of obtaining statistically significant data from 
rare events, and the difficulty of assigning probabilities to inherently social aspects of AI systems. 
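
The rare-event obstacle can be made concrete with a small sketch. It assumes independent, identically 
distributed trials (a binomial model), which is itself a strong assumption for AI systems and is not claimed 
in this paper. Under that assumption, after n failure-free trials the exact one-sided upper confidence bound 
on the failure probability p follows from solving (1 - p)^n = 1 - confidence. The function names are 
illustrative.

```python
import math


def upper_bound_zero_failures(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the failure probability
    after n_trials independent trials with zero observed failures."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def trials_needed(target_p: float, confidence: float = 0.95) -> int:
    """Failure-free trials needed before the upper bound drops to target_p."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_p))


for n in (100, 10_000, 1_000_000):
    print(n, upper_bound_zero_failures(n))
print(trials_needed(1e-6))  # on the order of three million failure-free trials
```

Even under these favourable assumptions, substantiating a very low failure probability by observation alone 
requires a number of failure-free trials that is rarely available for a novel system.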

The second type of confidence also poses challenges. We cannot base decisions concerning the well-being 
of stakeholders and society on pure feelings but on strong knowledge based on facts and trustworthy evidence. 

Therefore, assurance may provide grounds for justified confidence through uncertainty quantification 
only if based on robust statistics, that is, knowledge about properties of the statistical distribution of the 
parameter in question, and/or judgemental assessments only if based on sound argument substantiating the 
truthfulness of the claim. 

Therefore, the immediate goal, or primary effect of assurance, is to generate knowledge, knowledge to 
decrease or establish the uncertainty about a claim, addressing both types of uncertainty when appropriate. 
A Functional Analysis System Technique diagram (FAST diagram) illustrates the relation between assurance, 
knowledge and confidence (Figure 1). 


[Figure 1 diagram: a chain of four functions, read left to right as "How" and right to left as "Why": 
increase/establish confidence, decrease uncertainty, generate knowledge, perform assurance.] 

Figure 1: FAST diagram connecting knowledge to confidence 


As knowledge is the “hub” of assurance, knowledge must be treated systematically and expressed explicitly 
to enable it to be rigorously scrutinised. This is to avoid confidence being based on unsubstantiated 
feelings and pure guesswork. The assurance case is a systematic and explicit way of representing and treating 
knowledge (see ’Assurance case - a systematic way to represent knowledge’). 

As AI systems are inherently complex and display emergent behaviour, the knowledge about the truthful- 
ness of the claim must address all elements that affect emergence. Elements necessary in analysing emergent 
behaviour in engineered socio-technical systems are encapsulated in the systems approach (see ’The systems 
approach to handle system complexity and emergence’). 

Intuitively, the higher the risk that the AI system poses to stakeholders, the higher confidence we need 
that it will indeed behave as expected. As knowledge decreases uncertainty and increases confidence, we 
need a way to assess the strength of knowledge. How the strength of knowledge is measured is discussed in 
’Objectivity - a metric of strength of knowledge’. Assessing the strength of knowledge is key to adjusting 
the assurance effort to the risk level. Figure 2 depicts how the different items discussed above are connected. 


[Figure 2 diagram not reproduced. Recoverable relation labels: DETERMINE, ESSENCE OF, SUBJECT MATTER, 
REDUCES, ANALYSES, METRIC OF STRENGTH OF.] 

Figure 2: The ontology of assurance 


5 Ethics - respecting stakeholders 


Stakeholders hold objectives and pursue goals through utilising the AI system; that is, they use the AI 
system for a reason. A system’s mission is expressed as system requirements that are elicited and understood 
from these objectives. 

The stakeholders may be users, developers, and bystanders who have nothing to gain from the system 
but may be affected by it. Through its legislation and standards, the government represents stakeholders 
that cannot be consulted directly, such as the natural environment, future generations, the general public, 
children, etc. In such cases, conformance to standards means meeting stakeholders’ objectives and interests. 

Stakeholders need confidence that their objectives are, or will be, fulfilled or that a deviation from those 
objectives is acceptable. Implicitly, stakeholders also hold objectives of being safe, secure, treated fairly and 
more. These objectives may not be directly linked to the reason for developing and using the AI system in 
the first place. The system requirements must incorporate such implicit objectives. These kinds of system 
requirements can be termed mission-supporting requirements, or non-functional requirements [51], or even 
system constraints³ [38]. 

In the context of AI assurance, mission-supporting system requirements should be based on a set of 
ethical principles, such as those in [5, 25, 42]. 

These kinds of requirements can further be categorised into technical and socio-technical aspects. Techni- 
cal requirements address the technical properties of the system, which typically relate to physical properties. 
Adherence to such requirements may be determined by studying the system’s constituents in isolation and 
then aggregating their properties. These properties can be termed resultant properties because there is a 
linear relationship between the properties of the constituents and the properties of the system. Analysing 
such properties may be based on a reductionist approach. Examples of these types of requirements are 
reliability and robustness. 

Socio-technical requirements address aspects that emerge as social and technical components interact. 
We may describe these kinds of requirements within a continuum of social influence, where safety and privacy 
are examples at the lower end, and fairness and transparency at the higher. Here, social influence describes 
the level of personal preference and cultural background affecting judgements and decisions in expressing 
preferences, eliciting requirements from these preferences, and finally refining high-level requirements into 
lower-level detailed requirements. 


³ Prof. Nancy Leveson terms this “safety constraints”; however, when the scope of such requirements is 
expanded to other system quality characteristics, they can be termed “system constraints”. 

There is no hard border between technical and socio-technical properties, so we may expand this contin- 
uum to encapsulate the technical requirements and properties. 

The two categories indicate to what degree it is possible to quantify the particular aspect (Figure 3). 


[Figure 3 diagram: an axis labelled "Social influence" spanning Technical (resultant) and Socio-technical 
(emergent) requirements.] 

Figure 3: System requirements quantifiability 


The number and nature of assumptions needed to quantify a socio-technical requirement are greater 
than those related to technical requirements. As the number and nature of assumptions increase, 
so does the uncertainty about whether the quantification actually represents the essence of the original 
social requirement, or even whether it can be quantified [49]. 

Conflicts often arise between requirements directly related to the mission of the system and the mission- 
supporting requirements. Moreover, similar conflicts may also arise between the objectives and goals of 
different stakeholders, and even between different objectives of the same stakeholder (e.g. long-term vs. short- 
term goals). One understanding of ethics is: “the identification, study, and resolution or mitigation of conflicts 
among competing values or goals” [39]. The assurance effort should facilitate and document sound trade-offs 
to resolve these conflicts between competing goals. 


6 The uncertainty-based risk concept 


As a unilateral frequentist approach to risk is inadequate due to the complexity and novelty of AI systems 
and the novelty of applications, we need a broader approach to risk. [4] defines risk as: “Effect of uncertainty 
on objectives”. Here, probability is represented by the more generic concept of uncertainty. 

As the definition is rather abstract, we need to operationalise it to make it more usable. 


6.1 The ontology of risk 


Risk can be rephrased as a concern about deviation from objectives; that is, objectives are not achieved or 
only partly achieved. One such objective can be that an activity is performed safely. A deviation from such 
an objective would mean the activity is unsafe and could lead to an accident. Hence, a deviation from the 
objective may be a hazardous event or hazardous system state. 

Uncertainty must be understood as a property of a deviation from objectives [32]. There are two issues 
connected to deviation from the objective: 1) The uncertainty of whether the deviation will occur, and 2) 
The uncertainty about the consequence should the deviation occur (Figure 4). 

Following the line of thought above, it is uncertain whether a hazardous event will occur or not, and the 
consequence is uncertain should it occur. 

Uncertainty may be split into two forms: 1) epistemic and 2) aleatory [26]. Epistemic entails uncertainty 
due to lack of knowledge, and aleatory entails uncertainty due to the variability of a stochastic process, 
often expressed as a probability distribution. Therefore, the ontology of risk can be illustrated as depicted 
in Figure 5. 

This uncertainty-based risk concept, therefore, captures the traditional understanding of risk, that is, 
some combination of consequence and the probability of it occurring. In addition, by making epistemic 
uncertainty explicit, we better capture the challenges of novel technology. 


[Figure 4 diagram not reproduced. Recoverable labels: consequence, deviation from objective; relations 
PART OF and PROPERTY.] 

Figure 4: The concept of risk 


[Figure 5 diagram not reproduced. Recoverable labels: risk, consequence, uncertainty, aleatory; relations 
PART OF and FORM OF.] 

Figure 5: The ontology of risk 


6.2 Epistemic and aleatory uncertainty 


Knowledge reduces epistemic uncertainty. (In Figure 1 and Figure 2, we simplified this relationship by not 
differentiating between the two forms of uncertainty.) However, generating knowledge about the mechanisms 
forming the probabilities of interest decreases the epistemic uncertainty concerning the aleatory uncertainty. 
In this case, the epistemic uncertainty may be termed a meta-uncertainty, that is, epistemic uncertainty 
about aleatory uncertainty. Therefore, knowledge is key and can be considered the hub of assurance. 
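
The notion of meta-uncertainty can be illustrated with a minimal sketch, assuming independent trials and a 
uniform Beta(1, 1) prior over a failure probability p; these modelling choices are assumptions made for this 
example and are not part of the RP. The probability p stands for the aleatory uncertainty, while the spread 
of the posterior over p stands for the epistemic uncertainty about it.

```python
import math


def beta_posterior(failures: int, successes: int,
                   prior_a: float = 1.0, prior_b: float = 1.0):
    """Mean and standard deviation of the Beta posterior over a failure probability p."""
    a = prior_a + failures
    b = prior_b + successes
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Same observed failure rate (10 %), increasing amounts of evidence.
for trials in (10, 100, 1000):
    failures = trials // 10
    mean, std = beta_posterior(failures, trials - failures)
    print(f"trials={trials:5d}  estimate of p={mean:.3f}  epistemic spread={std:.3f}")
```

As evidence accumulates, the estimate of p settles near the observed rate while the spread around it 
shrinks: knowledge reduces the epistemic uncertainty but leaves the aleatory variability itself unchanged.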

As uncertainty is one part of risk, decreasing the uncertainty reduces risk. Risk is thus reduced by 
reducing epistemic and/or aleatory uncertainty. The two forms of uncertainty may be thought of as belonging 
to two different domains - knowledge domain (epistemic) and real-world domain (aleatory) (Figure 6). As 
both forms of uncertainty are part of risk, and risk also captures the consequence, risk encapsulates both 
domains. 

By changing the system design, the aleatory uncertainty about a deviation from objective, that is, the 
probability of an accident⁴, changes. The change may concern the probability that an accident occurs, the 
probability that, e.g., lives are lost (consequence) should an accident occur, and the kind of accidents that 
can occur. Changing the aleatory uncertainty is part of risk management and system design. 

The risk management effort needs knowledge to make good decisions. This knowledge is provided through 
assurance. 


⁴ Assurance should in general not limit the scope to accidents as a source of deviation from objective; 
long-term usage may result in such deviation without it being linked to a single event such as an accident. 


[Figure 6 diagram not reproduced. Recoverable labels: real world domain, knowledge domain, change system 
design, assurance, risk management/system design, strength of knowledge, meta-uncertainty, information 
exchange; relations ALTER, DECREASE, RESPONSIBLE FOR.] 

Figure 6: Real world and knowledge domains 


6.3 Objectivity - a metric of strength of knowledge 


By generating knowledge about the system, the epistemic uncertainty about deviation from objective changes, 
that is, knowledge about how an accident may occur or the potential consequence should it occur. High risk 
means severe potential consequences combined with a large degree of uncertainty (epistemic and/or aleatory). 
As knowledge decreases uncertainty, high risk requires strong knowledge, that is, knowledge substantiated 
with strong grounds for justification. 

Justification, and thereby knowledge, is, among other things, based on artefacts representing the system 
and its properties, together with how these artefacts are interpreted, that is, the reasoning used to conclude 
based on these artefacts. Artefacts, such as training data, algorithms, source code, and system descriptions, 
may represent the system directly. Other kinds of artefacts may indirectly represent it, e.g., artefacts 
generated through verification, such as test cases, test results and results from inspections and reviews. The 
strength of knowledge is directly linked to these artefacts and the process of generating and collecting them. 

Distinguishing weak from strong knowledge requires a metric by which the strength of knowledge can 
be assessed. By comparing the definitions of knowledge and assurance, we recognise the similarities. Both 
definitions contain the term “justified”: The degree of justification for a true belief (knowledge) - the grounds 
for justified confidence (assurance). Degree of justification is central in assessing both strength of knowledge 
and degree of confidence (via uncertainty as shown in Figure 2). A high degree of confidence requires strong 
grounds for justification. 

The degree of objectivity encapsulates the aspects important for assessing the degree of justification, 
that is, the strength of knowledge. Hence, the strength of knowledge is measured through the degree of 
objectivity. The likelihood that the result of an enquiry represents the truth increases if it is conducted in 
an objective manner, including the artefacts produced and used in that enquiry. 

Ensuring consistency and repeatability in our enquiries requires that the concept of objectivity is described.
Objectivity in this context is a multi-dimensional, non-orthogonal and non-binary concept [21].
Hence, objectivity cannot be treated based on a reductionist approach.

There are three categories (i.e. dimensions) that lay out the space of objectivity [21, 33]:

1. properties and processes by which the artefacts are generated
2. reasoning, or the thinking about those artefacts
3. social processes concerning items 1 and 2.


Item 1 is about interacting with the system and its stakeholders during its entire lifecycle. It is about the 
choice of methods, how they are applied, and how those decisions influence the properties of the outcomes, 
that is, the artefacts. This category also includes procedures, methods, techniques, first principles in physics, 
standardised equations, algorithms, etc. 


Item 2 is about how people and organisations think, and the reasons and positions they
take based on their interests and roles.

Item 3 is about the social processes that advocate different viewpoints, such as agreement among subject- 
matter experts about the suitability and the correct use of methods used to generate artefacts and how to 
think about those artefacts. This kind of objectivity can be thought of as a form of inter-subjectivity and 
is strengthened if the group consists of individuals with different but relevant competence. The content of 
standards is a result of such agreements. 

An important activity in assurance is the generation and collection of evidence through verification and 
validation (V&V). V&V is described through two properties: 1) The level of intensity in the V&V effort, and 
2) the level of rigour in the V&V effort [31]. V&V intensity is connected to the size of the scope, the number 
of system artefacts investigated, and the level of V&V involvement in each phase of the system lifecycle. V&V 
rigour is connected to comprehensiveness and thoroughness, leaving less room for logical inconsistencies and 
contradictions in the results, that is, performed with different levels of formality concerning techniques and 
documentation. One useful metaphor describing the relationship and difference between the two properties 
may be that increased V&V intensity makes the mesh width smaller and smaller while increasing the V&V 
rigour means that each mesh is investigated closer and closer. 

The output from the V&V effort is the evidence representing the system properties of interest, such as 
safety, reliability, robustness and security. V&V intensity and rigour affect the evidence properties such as 
quality, capability, and coverage. 

The assessment of the strength of knowledge through the degree of objectivity and further through the 
V&V intensity and rigour, and the resulting evidence properties cannot be a simple checklist or categorised 
as numerical scores and then aggregated as a simple sum. The strength of knowledge must be assessed in 
each particular project in the context of a totality. That is, the strength (of knowledge) is not a resultant 
property of the degree of objectivity (and V&V), but emergent. Assessing the strength of knowledge depends 
on the judgement of experts in the relevant disciplines. 


7 Assurance and system risk


Figures 1 and 2 in ‘The ontology of assurance’ showed how knowledge generated in the assurance effort
reduces uncertainty, and that uncertainty determines confidence. Moreover, Figure 5 in ‘The ontology of
risk’ showed that uncertainty is one part of the risk concept. Hence, assurance and risk are connected
through uncertainty (Figure 7).


[Figure 7 diagram. Recoverable relation labels: PART OF; REDUCES; GENERATES; DETERMINE; GROUNDS FOR JUSTIFIED.]


Figure 7: Assurance is connected to risk through uncertainty 


There is, however, another connection in addition to the one mentioned above. In the top left corner
of Figure 2, it is indicated that the subject matter of the knowledge is the claims. Claims are statements
about system properties that address the system requirements elicited by stakeholders and their concerns
and objectives (Figure 8).


[Figure 8 diagram. Recoverable relation labels: AVERT; PART OF; CONCERN ABOUT; KIND OF; REDUCES; ELICITED TO AVOID; CLOSES; TRANSFERRED INTO; SUBJECT MATTER.]


Figure 8: Assurance is connected to risk through claims 


Stakeholders are generally risk averse [37] and are concerned about the consequences of losses. They need
adequate confidence that potential losses are acceptable. Assurance addresses these concerns by generating 
knowledge about the truthfulness of the claims made about the system properties. 


8 Requirement characteristics 


Suppose the system boundary interacts with society (e.g. a self-driving autonomous vehicle). In that case, 
the stakeholder objectives (left in Figure 8) tend to be located from the middle to the right in Figure 
3, which results in more social-oriented top-level requirements (left in Figure 11). However, suppose the
system boundary interacts with a larger technical system (e.g. an object detection system within a situation 
awareness system). In that case, the top-level requirements may be more technical-oriented and, therefore, 
more straightforward to quantify. 

In case the top-level requirements are social-oriented, there are two extremes: 1) the top-level requirement 
is concretised and quantified and further refinement is a matter of statistical and mathematical deduction 
based on probabilistic models or first principles of physics, or 2) the top-level requirement is never quantified
and remains qualitative, where further refinement is a matter of deduction based on system behavioural
models and ethics. In both cases, the deduction must be validated to ensure that the refined requirements 
adequately represent the higher-level requirements. 

Between these two extremes, the top-level requirement may be qualitative, refined to a certain LoA and 
then quantified. This LoA may be at the AI subsystem’s boundary, or any other level. Note that if the LoA 
of quantification is at the highest LoA, we have extreme 1) above. 

The top-level requirements may also be split into quantifiable parts and qualitative parts, which is just 
a combination of the two extremes described above. The quantitative and qualitative parts may also be 
swapped with proxies. The proxy (or surrogate) property is selected to represent the system property
specified by the top-level requirement (e.g. reliability acts as a proxy for safety). The motivation for using 
a proxy is often that it can be easier to analyse or quantify; that is, the proxy property is located further to 
the technical side in Figure 3. However, caution must be applied using proxy properties because they may 
not adequately reflect the original property [30]. 

The argument strategy depends heavily on the nature of the requirements, such as technical vs. social,
the risk level, the novelty of the technology and the choices made in refining them as described above.

In either case, quantitative, qualitative or a mix of the two, refinement and concretisation are based on
system models, which could be behavioural, probabilistic, logical, or based on first principles of physics.

The assessment of the argumentation is also influenced by the same aspects: nature of the requirements, 
risk and so on. If the requirement is quantified, the assessment of its fulfilment may be trivial. However,
the validity of the assumptions made when quantifying social-oriented requirements needs to be
scrutinised as thoroughly as when the requirements remain qualitative.

If the requirements remain qualitative, the assessment of their fulfilment needs to be judgemental, based on
degree of belief (see ‘Assessing arguments’ for a further discussion on assessment in the context of
epistemology).


9 Two kinds of risk - system risk and assurance risk 


In connection with assurance, risk is divided into two kinds: 1) system risk, and 2) assurance risk.
Risk is always connected to a loss, that is, a consequence (with associated uncertainty). 

System risk is what we normally think of when considering risk, that is, a possible (undesired) consequence 
of operating a system, e.g., to people, an asset or the environment. 

The risk associated with assurance can be termed assurance risk⁵. The knowledge generated in assurance
is used to make decisions. Therefore, the loss in this case is associated with making the wrong decision. A
well-known example is rejecting a drug that is safe and could save many lives, so that it never reaches
the market. The rejection, in association with assurance, could be due to a lack of, or weak, knowledge.
Other possible causes are demanding too strong knowledge, or inaccurate/false information leading to a
belief revision that does not represent the truth.

These questions are presently discussed in the AI community and in society at large, e.g. how strong
knowledge is needed to allow autonomous vehicles into traffic among other vehicles and humans: on the
one hand, they may cause accidents that human drivers would not cause; on the other, they may save many
lives by avoiding accidents that humans sometimes cause. The latter risk is connected to opportunity loss⁶,
or underuse of AI [24].

Assurance risk is a type of epistemic risk [15]. In assurance, epistemic risk can be defined as the risk of 
being wrong about claims concerning system properties. One subgroup⁷ of epistemic risk is termed inductive
risk [20], that is, making a wrong decision in the “inductive inference from evidence to hypothesis acceptance 
or rejection” [15]. Inductive risk is thus connected to uncertainty propagation discussed later in this paper. 

In assurance, objectivity serves a dual purpose: 1) ensuring that the system properties are as expected,
and 2) reducing the epistemic risk (i.e. ensuring adequate strength of knowledge).

Thinking in terms of assurance risk opens the door to analysing the assurance as a system, that is, 
thinking of the assurance as the target system of assurance. How can assurance fail to reach a conclusion 
that adequately represents the truth? Although outside the scope of this paper, assuring the assurance is a
kind of meta-assurance and can be performed using the same assurance principles as described in [11] and 
in this document. 


10 Epistemology and justification 


Although knowledge is not easily defined, it must be linked to accessible facts about the subject matter.
Moreover, building confidence through knowledge requires not only apparently truthful propositions, but
also that the reasoning is sound, relevant and adequate, that is, that the proposition is justified: “Someone who is
very confident but for the wrong reasons would also fail to have knowledge” [41]. The reason for believing
that a proposition represents the truth must be justified.


⁵This leads to two different kinds of requirements: 1) System requirements (about the system properties and its use), and 2)
assurance requirements (properties of the assurance effort). [11] contains generic assurance requirements. Identifying relevant
system requirements is addressed by the assurance requirements in [11]. Moreover, specific assurance requirements may also be
found in domain-specific standards and guidelines, such as requirements about independent verification and validation (IV&V),
independent assessment, specific kinds of evidence and documentation and so on. Assurance requirements are sometimes referred
to as assurance process requirements; however, although assurance requirements may impose requirements on the assurance
process, the process is just one of many elements in [11], and there are several other elements that need to be addressed in a
generic assurance effort.

⁶System risk does not address the risk of not accepting an AI system; however, assurance risk does.

⁷[15] discusses seven different subgroups of epistemic risk. Although many are highly relevant for assurance, the details in
connection with assurance will have to be discussed in a later paper. The takeaway is that they are handled through the
concept of objectivity discussed earlier in this paper.

Justification may be thought of as an argument for why we hold certain beliefs or why we think those 
beliefs are reasonable and true. These justifications may be under the law or before God. However, in the 
context of assurance, justifying beliefs must be based on knowledge, or in other words, be based on epistemic 
justification [54]. (An AI system needs to conform to laws and regulations; the point is that the justification
must be based on knowledge.)

Assurance seeks epistemic justification to establish if a proposition can be turned into a belief, that is, 
belief through warranted propositions. 

Belief revision is the process of changing belief based on new data [22]. It is important to emphasise 
that good reasoning is no guarantee of truth. Seeking the truth and believing to have found it using sound
methods and reasoning is no guarantee of actually having found it.

Justifying a proposition may, in principle, entail an infinite chain of justifications (infinitism): the
justification of the justification of the justification, and so on. This is, of course, unacceptable. The question, then, is
when to stop this chain of justifications.

One strategy is to continue until the supporting justifications become self-evident, that is, propositions 
that do not need further justification (foundationalism). This kind of justification results in a hierarchy of 
propositions, and the “bottom” of this hierarchy consists of fundamental propositions, that is, self-justified 
propositions. 

Alternatively, we may ensure that the propositions support each other, that is, the propositions are 
coherent (coherentism). With this strategy, there are no fundamental propositions. Critics claim that this
strategy can lead to circular argumentation [54]. 

A reasonable approach is to combine the two strategies, that is, ensuring coherence within the set of 
propositions and justification and stopping the chain of justification when reaching a self-justified proposition. 

In practice, one may not reach a self-evident fundamental level for several reasons. One reason may be 
that there is a dispute about whether such a level is actually reached⁸; another reason may be that continuing
the chain of justification requires disproportionate resources. Therefore, there may be residual uncertainty 
as to whether a proposition represents the truth. 

Other sources of uncertainty are that there may exist evidence that weakens the proposition, or there may 
be a lack of available evidence. Moreover, other obstacles may hinder the generation of additional evidence, 
such as technical limitations, ethical concerns, lack of statistical data, or other practical causes. 

There is no universal uncertainty threshold for when an agent will accept a proposition and when he 
rejects it. Moreover, given a justification of a proposition, there is no universal law governing the level of 
uncertainty an agent will feel about its truthfulness. 

Belief revision depends not only on the properties of the justification of the proposition but also on the 
agent’s epistemic state, that is, the agent’s required rationality to turn a proposition into a belief, prior belief 
and any other properties important for the agent to represent facts about the world. 

The uncertainty threshold for an agent’s belief revision also depends on aspects such as the risk (perceived 
and/or actual) of accepting or rejecting a proposition (including being indifferent). Moreover, an agent’s level 
of uncertainty, given a justification of a proposition, depends not only on the strength of the justification, 
but also on aspects such as the degree of susceptibility to cognitive biases⁹ [37] and rhetoric. Obviously,
we should strive to minimise aspects of belief revision that are unrelated to the properties of the justification. 


⁸Showing compliance with an international industry standard is often regarded as such a self-justified belief, that is,
providing evidence that a system complies with such a standard is often regarded as adequate for believing a proposition that, e.g., a
system is reliable, fair, safe and secure. An international standard should reflect good industry practice. However, artificial
intelligence is such a novel technology that even if there exists a relevant international standard, it may not be regarded as
self-justified because the standard itself does not necessarily reflect any industry practice (because no such
practice exists), or at least the practice may be inadequate. This means that it might be necessary to continue the justification
chain further.

⁹Perhaps the most commonly known is the so-called confirmation bias, that is, our tendency to seek evidence that confirms
our prior beliefs. However, most other cognitive biases are also at work, like the illusion of understanding and what you see is all there
is (WYSIATI), that is, our tendency to believe that we understand complex topics by filling in the information gaps and the
epistemic gaps so that the story becomes compelling and coherent, which leads to confidence in the truthfulness of the story
(or proposition in this case).


An agent’s prior beliefs cannot, and should not, be controlled and cannot be totally known. Nevertheless, 
prior belief is central to belief revision. Data-oriented Belief Revision (DBR) [44] (simplified illustration in
Figure 9) is a model of belief revision that can illustrate the role of prior belief in belief revision.


[Figure 9 diagram. Only the label "External data" and a sequence of arrows are recoverable.]


Figure 9: Simplified epistemic processing in DBR [44]


After new data is available about a proposition (External data), the data is assessed to determine its
relevance and strength, possibly forming a new or updated belief set, termed belief selection in Figure 9.
This process regulates the interaction between data and beliefs, what to believe in, and with what strength. 

As belief revision is tightly connected to the agent’s prior beliefs and possible degrees of cognitive biases, 
we cannot assess the epistemic strength of the justification by appealing to the agent’s prior beliefs, or what 
seems to be “very reasonable” and the like. What seems reasonable is an internal feeling in each agent and 
is largely based on his current epistemic state. 

Instead, the agent needs to be nudged towards sound rationality in assessing uncertainty using a more
comprehensive framework of thinking about the level of uncertainty (epistemic strength of the justification),
without being forced into an epistemic straitjacket of predefined categories of epistemic levels. This
framework was described in ‘Objectivity - a metric of strength of knowledge’.

We want to decrease uncertainty as the risk of accepting a false proposition increases. Less obviously,
we also want to decrease uncertainty as the risk of rejecting a true proposition increases.
Accepting a false proposition, on the one hand, or rejecting a true proposition, on the other,
represents assurance risk (see ‘Two kinds of risk - system risk and assurance risk’).

Decreasing uncertainty to the point of accepting a proposition, or in other words, revising one’s belief, 
can be achieved by both strengthening the justification that the proposition is true, and/or by increasing 
effort in seeking justification that the proposition is false without finding such justification. Sometimes, the
only way to justify a proposition p is to find a strong justification that ¬p is not the case¹⁰.

One way to ensure a proper assessment of whether knowledge is built on epistemic justification is through
argumentation. While belief revision describes how we should update our beliefs, argumentation is a way to
make belief revision occur. “The two concepts are two sides of the same epistemic coin” [43, 44]. 


11 Assurance case - a systematic way to represent knowledge 


The assurance case is a way to represent knowledge (see ‘The ontology of assurance’). At its core, an
assurance case consists of a hierarchy of claims and arguments, including evidence that substantiates those
claims. The claims are equivalent to the before-mentioned propositions, and the argument is equivalent
to the before-mentioned justifications. Moreover, a claim may also be understood as a reformulation of a
requirement. The question is then how to lay out, organise, and assess the arguments, which is the topic of
this section.


¹⁰A famous statement from software testing illustrates this: software testing cannot prove the absence of bugs, only their
presence. A proposition that some software code is bug-free (p) cannot be proven through testing alone. Software testing tries
to find bugs, and when no bugs are found, one may start to believe p because one has not found evidence that ¬p is the case.
(Of course, as most testing is non-exhaustive, not finding bugs does not mean the absence of bugs.)

One of the most recognised and influential layouts of arguments is the schema described by Stephen
Toulmin in his 1958 book “The Uses of Argument” [52]. Toulmin’s motivation was to create a richer format
that better reflected how people argued in reality instead of the more formal and traditional format consisting 
of premise and conclusion. 

The argument layout consists of six elements [50]: Claim (or Conclusion) (C), Data (D) (or Datum, 
Toulmin uses both terms), Warrant (W), Qualifier (Q), Backing (B), Rebuttal (R) (Figure 10).


[Figure 10 diagram. Recoverable layout: D → So, Q, C; Since W; Unless R; On the behalf of B.]


Figure 10: General layout of an argument [50, p. 97]


In the simplest form, (D) may be some evidence that proves that (C) is the case. The transition between 
(D) and (C) may not be trivial, so a warrant needs to act as an inference licence between (D) and (C); that 
is, (W) acts as a bridge between (D) and (C). (W) may also be challenged, so a backing (B) may be needed 
to support (W), that is, why (W) holds. (Q) indicates the strength of the step (i.e. strength of the “bridge”) 
from (D) to (C). (R) indicates circumstances in which (W) may not hold. 
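
The six elements and their roles can be illustrated with a small data structure. The following is only a sketch for illustration: the class name, field names, default qualifier and example content are assumptions made here and are not taken from Toulmin or from any assurance tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToulminArgument:
    """One argument step in Toulmin's layout (all names are illustrative)."""

    claim: str                     # (C) the conclusion argued for
    data: List[str]                # (D) evidence offered in support of (C)
    warrant: str                   # (W) inference licence bridging (D) and (C)
    qualifier: str = "presumably"  # (Q) strength of the step from (D) to (C)
    backing: Optional[str] = None  # (B) why the warrant (W) holds
    rebuttals: List[str] = field(default_factory=list)  # (R) when (W) may not hold

    def render(self) -> str:
        """Spell the argument out as one readable sentence."""
        text = (
            f"{'; '.join(self.data)}. So, {self.qualifier}, {self.claim}, "
            f"since {self.warrant}"
        )
        if self.backing:
            text += f" (backed by {self.backing})"
        if self.rebuttals:
            text += f", unless {' or '.join(self.rebuttals)}"
        return text + "."


# Hypothetical example content, invented purely for illustration.
argument = ToulminArgument(
    claim="the object detector meets its detection-rate requirement",
    data=["the detector passed the held-out test campaign"],
    warrant="the test set is representative of the operating conditions",
    backing="an independent review of the test-set coverage",
    rebuttals=["the operating conditions drift from those tested"],
)
print(argument.render())
```

Keeping (W), (B) and (R) as explicit fields, rather than folding them into the evidence, makes each of them available for separate scrutiny.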

Although the elements of an argument described by Toulmin are necessary aspects of an epistemic 
justification substantiating a proposition or assertion, the schema, in its simplest form, is insufficient for 
assurance of complex AI systems. The schema needs to be expanded. 

Firstly, in the assurance of AI systems, there are many claims. System claims represent statements about
the system properties and its use. The underlying requirements address many system properties, including ethical
aspects. Moreover, the claims must be refined at several levels of abstraction (LoAs). The LoAs link back to
the LoAs connected to the systems approach (see ‘Levelism’) and foundationalism (see ‘Epistemology and
justification’).

Secondly, although one of Toulmin’s key motivations was to enable “practical assessment of arguments” 
[50], he did not discuss aspects of argument assessment in detail. Clearly, when, e.g., a (top) claim is refined 
into two or more subclaims with accompanying justification, assessing the strength of each argument needs 
to be aggregated in some way to reflect the confidence in the top claim. Moreover, each element in the 
argumentation schema should be assessed, leading to a network of assessments on different elements of an 
argument on different LoAs. 

Several expanded argument schemas based on Toulmin have been developed, such as Goal Structuring
Notation (GSN) [8] and Trust-IT [27].

An assurance case organises these arguments in a systematic and structured way and represents the knowledge
generated in the assurance (see Figure 2). Different ways are possible based on the various argument schemas,
such as [8] or [10]; both are compatible with [7]. A metamodel of an assurance case may also be found in 
[12]. 


11.1 Assessing arguments 


Developing and documenting arguments in a systematic and transparent manner is vital to be able to properly 
assess the knowledge represented by the arguments. As already indicated, the intention of an assurance case 
is precisely this, and it is particularly important when introducing novel technology into society, such as AI.

An assessment may be organised as part of a review process such as an independent audit. However, as
the assurance case should be developed and maintained as part of the entire system lifecycle, so should
the assessment. The assessment may be performed by different parties, such as the system developer or an
independent party.

Independently of who performs the assessment, its value depends on the quality of the assessment. As a 
consequence, relevant competence of the assessment agent (e.g. the assessor) is a prerequisite. Introducing 
AI into society is associated with a large degree of uncertainty; therefore, a cross-disciplinary assessment
process should be considered.

Claims (and arguments including evidence) associated with technical system properties may be more 
quantifiable (left in Figure 3), and claims associated with social impact on society are less quantifiable (right 
in Figure 3). While in the first case, the assessment may be based on the achievement of some numerical 
value, in the latter case, the assessment necessarily will be more judgemental. Indeed, the difference between 
the two can be tied back to Toulmin’s intention when developing a richer and more flexible argumentation 
schema than argumentation solely based on formal logic, mathematics and frequentist statistics. 

The challenge and the strength of judgemental assessments is that they may be influenced by personal 
preferences and domain-specific traditions that may or may not be relevant in the case at hand. Separating 
the irrelevant from the relevant and valuable may be difficult. Elements (in addition to the above-mentioned 
aspects) that will increase the chance of a proper assessment are: 1) awareness of argument properties, 2)
adherence to the principles of objectivity, 3) transparency in the assessment, and 4) the ability to combine different
kinds of evidence and aggregate uncertainty and degree of belief.

More information about 1) and 2) can be found in [31] and in ‘Objectivity - a metric of strength of
knowledge’. The assurance case promotes transparency, item 3); however, transparency also depends on
aspects such as the understandability of the assessment concepts and the communication of the assessment 
results. 

A common and well-known method for making judgemental assessments that reflect an agent’s belief,
and for aggregating the uncertainty about those beliefs, is Bayesian Networks (BN). The
details of BN are well known and documented, so they will not be discussed further here.

One challenge with BN is that it forces the assessor to assign (subjective) probabilities to his belief; however,
such an assignment may not capture the essence of uncertainty and degree of belief in all circumstances.

Although Machine Learning (ML) has arguably had the edge in recent years, many solutions tend to
depend on a combination of ML and symbolic AI (i.e. logic). Symbolic AI is more like traditional deterministic
software-based expert systems.

As ML is purely statistical, a frequentist approach to probabilities can be used when expressing uncertainty
about its outcome. However, for the part of an AI solution that is based on deterministic
software solutions, probability estimates may be inappropriate [28].

Moreover, the AI solution constitutes just one part of the entire AI-enabled system. Its behaviour depends
equally on other parts, such as humans and mechanical objects (see ‘The systems approach to handle system
complexity and emergence’). This means that a purely probabilistic approach to express uncertainty and
degree of belief may not be adequate, even if the AI-enabled system is solely based on ML.

An alternative to BN is belief functions [47]. Belief functions provide one way to use mathematical 
probability in subjective judgement and are a generalisation of BN [48]. Combining the theory of belief
functions and Dempster’s rule for combining those functions (a.k.a. “Dempster’s rule by Shafer¹¹” [19])
forms the so-called “Dempster-Shafer theory” (DST). DST does not depend on expressing uncertainty solely 
through subjective probability estimates as BN does. 

¹¹Other rules for combining belief functions also exist [46].


According to Dempster’s rule by Shafer, multiple arguments (including evidence) supporting a claim
will strengthen the belief in that claim, while counteracting arguments weaken the belief. Organising the
assurance case through a hierarchy of claims and subclaims and utilising, e.g., DST on the (Toulmin) argument
schema will aggregate the uncertainty and degree of belief from the bottom to the top of the hierarchy
(i.e. answering item 4 above).
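
The strengthening and weakening effect can be illustrated with a minimal sketch of Dempster's rule of combination over the two-element frame {true, false} for a single claim. The function names, the dictionary representation of mass functions and all numerical masses are assumptions made for this illustration; they are not taken from [19], [47] or any particular tool.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function assigns weight to subsets of the frame of discernment.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule of combination."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so that the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass not contradicting the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


T = frozenset({"true"})    # the claim holds
F = frozenset({"false"})   # the claim does not hold
TF = T | F                 # "don't know": mass left uncommitted

# Hypothetical masses for two supporting arguments and one counter-argument.
support_1 = {T: 0.6, TF: 0.4}
support_2 = {T: 0.5, TF: 0.5}
counter = {F: 0.5, TF: 0.5}

both = combine(support_1, support_2)
contested = combine(both, counter)

print(belief(both, T))       # higher than either supporting argument alone
print(belief(contested, T))  # lower again once the counter-argument is added
print(plausibility(contested, T))
```

The gap between belief and plausibility is the part of the assessment that remains uncommitted, which is what distinguishes this representation from a single subjective probability.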

Figure 11 visualises the discussion above. The top illustrates that stakeholders hold objectives (top right)
that they would like fulfilled with adequate confidence (top left). Stakeholder objectives or interests are 
elicited as (top-level) system requirements (right). These requirements are then refined and concretised 
into lower-level requirements applicable to the system at other LoAs (epistemic or ontological), e.g. an 
AI subsystem. An argumentation is developed and assessed, resulting in uncertainty about whether the 
requirement is adequately fulfilled or not, due to imperfect knowledge (epistemic uncertainty) or stochastic 
variability (aleatory uncertainty). 

Both kinds of uncertainty (aleatory and epistemic) about the degree of fulfilment of these lower-level 
requirements need to be propagated back up to the top-level (right) to form the stakeholder’s level of 
confidence about their objectives and interests. 


[Figure 11 diagram. Recoverable labels: stakeholders; HOLDS; ELICITED IN; DETERMINES; objectives; confidence; uncertainty; aleatory: probabilistic models; epistemic: belief functions; propagate uncertainty; validation; argumentation of fulfilment; argumentation strategy.]


Figure 11: Propagating different kinds of uncertainty 


Probabilistic models for uncertainty aggregation can be used when based on robust statistics about the
probability distribution of the parameter (i.e. system property) of interest. When the parameter of interest
reflects a probability distribution, but no robust statistics exist, BN can be used. When the parameter/property
of interest is not related to a probability distribution, belief functions can be used.


11.1.1 The assessment triangle 


The assurance case only partly answers item 3 above (transparency in the assessment). A way to disclose 
more details about the assessment is to use Visual Assessment of Arguments (VAA) [18]. VAA is based 
on a linguistic appraisal scale formalised using Dempster-Shafer’s belief and plausibility functions and then 
mapping this appraisal onto the opinion triangle proposed by Audun Jøsang [36]. The result is a visualisation
in the form of a triangle (Figure 12), capturing both the degree of belief and the uncertainty the assessor 
holds. 

A tool for organising the assurance case, such as NOR-STA [10], has implemented this triangle. The
assessor places a marker (indicated by the white circle) inside the triangle to represent their (expert)
opinion about each element of the argument (evidence, warrant, assumptions) supporting a claim. These
opinions are then aggregated for each claim and further aggregated upward in the hierarchy of claims.
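
As an illustration of what aggregating two opinions can look like, the sketch below applies Dempster's
rule of combination on a binary frame. This is a substituted, textbook technique chosen for the example;
it is an assumption and not necessarily the aggregation rule used by VAA or NOR-STA. The function name
and the example values are hypothetical.

```python
from typing import Tuple

# An opinion is (belief, disbelief, uncertainty) with the three parts summing to 1.
Opinion = Tuple[float, float, float]


def combine_dempster(first: Opinion, second: Opinion) -> Opinion:
    """Combine two independent opinions about the same claim (Dempster's rule)."""
    b1, d1, u1 = first
    b2, d2, u2 = second
    # Conflict: one source believes the claim while the other disbelieves it.
    conflict = b1 * d2 + d1 * b2
    if conflict >= 1.0:
        raise ValueError("totally conflicting opinions cannot be combined")
    norm = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + u1 * b2) / norm
    disbelief = (d1 * d2 + d1 * u2 + u1 * d2) / norm
    uncertainty = (u1 * u2) / norm
    return belief, disbelief, uncertainty


if __name__ == "__main__":
    # Two hypothetical pieces of evidence that both support the claim.
    print(combine_dempster((0.6, 0.0, 0.4), (0.5, 0.0, 0.5)))
```

Two supporting opinions reinforce each other, so the combined uncertainty is lower than either input.
A fully uncertain opinion (0, 0, 1) leaves the other opinion unchanged.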

The assessment triangle represents an assessor’s epistemic status concerning the truthfulness of a claim, 
and not primarily whether, e.g. the piece of evidence is trustworthy with adequate quality. A piece of 
trustworthy evidence supporting a claim will then be tagged by a marker in the lower right corner; a piece of 
trustworthy evidence defeating a claim will be represented by a marker in the lower left corner; no evidence, or 
an untrustworthy piece of evidence, will be represented by a marker at the top corner (because untrustworthy 
evidence does not alter the assessor’s belief about the claim). 
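
The corner placement described above follows from treating the opinion as barycentric coordinates. The
sketch below computes where the marker would sit, assuming an equilateral triangle with disbelief at
(0, 0), belief at (1, 0) and uncertainty at the top; the coordinate system and function name are
assumptions made for illustration.

```python
import math
from typing import Tuple


def triangle_position(belief: float, disbelief: float,
                      uncertainty: float) -> Tuple[float, float]:
    """Return the (x, y) marker position for an opinion inside the triangle.

    Vertices (assumed layout): disbelief (0, 0), belief (1, 0),
    uncertainty (0.5, sqrt(3) / 2).
    """
    if abs(belief + disbelief + uncertainty - 1.0) > 1e-9:
        raise ValueError("belief, disbelief and uncertainty must sum to 1")
    # The disbelief vertex is the origin, so it contributes nothing to x or y.
    x = belief + 0.5 * uncertainty
    y = uncertainty * math.sqrt(3) / 2
    return x, y


if __name__ == "__main__":
    print(triangle_position(1.0, 0.0, 0.0))  # trustworthy supporting evidence
    print(triangle_position(0.0, 1.0, 0.0))  # trustworthy defeating evidence
    print(triangle_position(0.0, 0.0, 1.0))  # no or untrustworthy evidence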




[Figure: triangle with the corners labelled uncertainty (top), disbelief (lower left) and belief (lower right).]


Figure 12: The assessment triangle [18, p. 32] 


12 Modularity in assurance 


An AI system may consist of several subsystems from different subsuppliers. Therefore, it may become
challenging to assure every subsystem within the same assurance process as the larger AI system. One
such challenge is the need for tight interaction between the organisations of the different suppliers to share
knowledge, which is important for establishing the level of confidence needed for the larger AI system, that
is, uncertainty about a subsystem needs to be propagated. Another challenge arises when a subsystem is
a standardised off-the-shelf component, and the supplier does not take part in the assurance process of the
larger AI system. Again, knowledge important for the propagation of uncertainty needs to be made available
to the assurance effort of the larger AI system.

The solution is to split the assurance into modules. Each assurance module represents the needed 
knowledge about a subsystem to be used by the assurance effort of the larger system of which a subsystem 
is a part. 

This can be compared to a type-approval of a component. The type-approved component complies with
a set of pre-defined requirements that are known by the potential user of the component. Modular assurance
is more generic because there may be no predefined requirements to comply with. Nevertheless, the content 
of the assurance module must convey the same kind of confidence conveyed by a type-approval certificate. 

However, it is important to note that there may be no commonly acknowledged assurance effort, such as 
certain mandatory tests that must be conducted to get a component type-approved. These test efforts often 
relate to the fact that the type-approved component is supposed to be used in certain environments and in 
specific kinds of operations. Assurance modules are not restricted to these predefined prescriptive limitations. 
However, the subsystem’s operational conditions must be captured and conveyed by the assurance module. 

Several assurance modules are connected through assurance contracts. These contracts represent the 
assurance results of a subsystem, and are the concrete entities that enter the assurance effort of the larger 
AI system. 

Connecting assurance modules through contracts makes the assurance effort of the larger AI system more
efficient, that is, when assuring the larger system, one does not need to repeat the assurance of its subsystems
but can still propagate the uncertainty that originates from these subsystems. Utilising modular assurance
simplifies assurance because fewer actors need to be included in the assurance effort. Moreover, it may enable
subsystems assured through alternative frameworks to be connected, given that the assurance contracts are
compatible.
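
One way to picture an assurance contract is as a small record that carries the assessed claims of a
subsystem together with the operational conditions under which they hold. The sketch below is purely
illustrative: the class and field names, the representation of conditions as a set of labels, and the
choice to fall back to full uncertainty outside the stated conditions are all assumptions, not content
of the RP.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, NamedTuple, Set


class Opinion(NamedTuple):
    belief: float
    disbelief: float
    uncertainty: float


# No knowledge either way: all mass is on uncertainty.
VACUOUS = Opinion(0.0, 0.0, 1.0)


@dataclass
class AssuranceContract:
    """Hypothetical record exchanged between assurance modules."""
    subsystem: str
    operational_conditions: FrozenSet[str]
    claims: Dict[str, Opinion] = field(default_factory=dict)


def opinion_for(contract: AssuranceContract, claim: str,
                intended_conditions: Set[str]) -> Opinion:
    """Return the opinion the larger system's assurance may reuse for a claim."""
    # Assumption: results are only reused inside the conditions the contract states.
    if not intended_conditions <= contract.operational_conditions:
        return VACUOUS
    # Claims the contract does not cover are treated as fully uncertain.
    return contract.claims.get(claim, VACUOUS)


if __name__ == "__main__":
    contract = AssuranceContract(
        subsystem="object detector",  # hypothetical subsystem
        operational_conditions=frozenset({"daylight", "dry weather"}),
        claims={"detects obstacles": Opinion(0.7, 0.05, 0.25)},
    )
    print(opinion_for(contract, "detects obstacles", {"daylight"}))
    print(opinion_for(contract, "detects obstacles", {"night"}))
```

The point of the sketch is that the larger system never re-runs the subsystem's assurance; it only
reads the contract, and the uncertainty stated there is what gets propagated upward.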

Iterations enable the process to start without knowing everything about the target system and its use. 
This is an important principle regarding novel technology. The starting point of the process may only be 
some more or less loose thoughts about the intention of a system, its composition, and the users. 

Suppose a process requires that everything is known, and it’s just a matter of documenting this knowledge; 
even starting this process would be difficult. Moreover, adhering to such a process would require a preliminary 
process to be conducted just to get to the “starting position”. It is better to encapsulate this preliminary 
process into the main assurance process to make it visible and controllable. 

It is important to note that the iterations should include items 1) and 2) above. The identified stakeholders
and their interests, which form the system requirements and claims, may not be static. The interpretation of the
traditional V-model of software development has been closely connected to the waterfall model. According
to these models, the system requirements and behaviour are defined at the beginning
and then kept unchanged. Then, a pre-defined verification effort is conducted before the system is regarded
as fit for purpose. There are no or few feedback loops to assess the adequacy of the system requirements
and the verification effort.

The iteration of the assurance process can be considered analogous to a more modern software development
framework such as Scrum [2], that is, following an agile approach [13] to assurance.

The properties of the target system are established during the system development. As the knowledge 
generated by the assurance effort should reflect these properties, the assurance process should be run in 
parallel with all system lifecycle phases. 

Many factors affect requirements about the system properties and development, such as potential market 
impact, development cost, risk, and more. Assurance should also be among these factors. A properly 
designed and robust assurance effort ensures adequate knowledge crucial to any decision-making relevant 
to developing and using an AI system. These decision-making processes are conducted in all phases of the 
system lifecycle. Decisions about needed design changes, or about how the system is used, are cheaper to make
early in the development phases, perhaps as early as in the concept phase, than costly design changes made late.

Continuously learning (i.e. re-trained) AI systems will evolve with time and usage. Therefore, the assurance
effort should continue into the operation to ensure that the knowledge generated in the assurance effort 
reflects the system’s current state, including its use. The assurance effort runs parallel with the system 
lifecycle, resulting in a continuous assurance effort. 


13 Conclusion


An assurance effort consists of much more than testing the AI component. Assurance is about generating 
knowledge about the target system properties important to stakeholders, that is, creating grounds for justified 
confidence. 

The paper discusses the motivation for the content of the new DNV Recommended Practice (RP) on the
Assurance of AI-enabled systems [11]. Some topics may appear novel, while some well-known and established
concepts have been given new content and understanding.

This might seem unfamiliar and perhaps even unnecessary to some practitioners. “Why can’t we just do
what we’ve always done?”, some may ask. In technical terms, the question is why methods based
on a reductionist philosophy are ineffectual. The short answer is: because of system complexity and emergent
behaviour.

We need to analyse the interactions and interdependencies to understand the behaviour of such systems. 
The discussion among practitioners about a reductionist versus a systems approach is equivalent to the old 
joke about the man who lost his car keys in a dark part of the road; he then walks over to a lamppost nearby 
and starts looking for them. He searches under the lamppost, not because he hopes to find the keys, but 
because this is where he can see. 

An AI system consists of more than the AI component. Within a system working in society, the AI 
component is often encapsulated by non-AI constituents, that is, it only interacts with non-AI internal 
system constituents. The AI component will often account for only part of the system behaviour; other
parts can be traditional software, human operators, and mechanical components. Moreover, the system 
behaviour is also largely affected by the environment in which it is operating. This is why the RP takes 
a systems approach to every aspect of assurance. The systems approach captures all aspects important to 
understanding the behaviour of a system through the CESM-metamodel and levels of abstraction. Referring 
to the joke above, the systems approach is a new (working) lamppost enabling looking for the car keys where 
they actually were lost. 

To understand what impact such systems have on society, and on whom, the assurance effort needs to
identify the stakeholders that may affect and/or be affected by the system. Stakeholders are located both
within the system and in the environment in which the system operates.

Understanding the kind and level of (unwanted) impact, that is, what can be seen as unacceptable
consequences, needs to follow some commonly recognised ethical principles. The RP lists five such principles
that can be refined into adequate lower-level system property requirements such as safety, robustness, security,
fairness and promoting human autonomy.

Risk describes the degree and uncertainty about deviation from these system requirements. The traditional
understanding of risk, that is, risk as some combination of consequence and likelihood, is inadequate
to capture the essence of risk in complex systems in general, and in particular when complex systems are
based on novel technology such as AI. Therefore, the likelihood part of the traditional risk understanding is
expanded to also capture epistemic uncertainty, that is, uncertainty due to imperfect knowledge.

Epistemic uncertainty decreases with increased justification substantiating a proposition. Although
assurance seeks the truthfulness of a proposition, the main point is the justification for why the proposition
represents the truth. The justification creates grounds for justified confidence; correctly guessing the truth
does not count as knowledge.

The way assurance justifies propositions is through argumentation. An argument contains at least some 
evidence and a warrant for why a piece of evidence supports the claim. Assuring an AI system that contains 
multiple subsystems concerning multiple system properties at different levels of abstraction, leads to a need 
for a systematic organisation of the argumentation, that is, an assurance case. An assurance case not only 
organises the argumentation, but also makes it transparent, opening the way for effective third-party assessment
that eventually increases confidence in a system. 

Testing generates evidence, and evidence is an essential part of an argument. Therefore, testing is 
essential in any assurance effort; however, it is insufficient to create grounds for justified confidence. As seen 
throughout this paper, assurance is much more than testing. 




References 


[1] ISO/IEC JTC 1/SC 42 - Artificial intelligence.

[2] What is Scrum? | Scrum.org. https://www.scrum.org/resources/what-scrum-module.

[3] ISO/IEC/IEEE International Standard - Systems and software engineering - Software life cycle
processes, November 2017.

[4] ISO/IEC/IEEE International Standard - Risk management - Guidelines, February 2018.

[5] Ethics Guidelines for Trustworthy AI. Technical report, EU, April 2019.

[6] ISO/IEC/IEEE International Standard - Systems and software engineering - Systems and software
assurance - Part 1: Concepts and vocabulary, March 2019.

[7] ISO/IEC/IEEE International Standard - Systems and software engineering - Systems and software
assurance - Part 2: Assurance case, March 2019.

[8] Goal Structuring Notation Community Standard, May 2021.

[9] Proposal for a Regulation laying down harmonised rules on artificial intelligence | Shaping
Europe’s digital future.
https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence,
April 2021.

[10] Argevide - System assurance management tools - Assurance cases. https://www.argevide.com/home/,
October 2023.

[11] DNV-RP-0671 Assurance of AI-enabled systems, September 2023.

[12] Structured Assurance Case Metamodel (SACM), October 2023.

[13] Atlassian. What is Agile? https://www.atlassian.com/agile.

[14] Mark A. Bedau. Is Weak Emergence Just in the Mind? Minds and Machines, 18(4):443-459, December
2008.

[15] Justin B. Biddle and Rebecca Kukla. The geography of epistemic risk. In Exploring Inductive Risk:
Case Studies of Values in Science, pages 215-238. Oxford University Press, August 2017.

[16] Mario Bunge. Emergence and Convergence: Qualitative Novelty and the Unity of Knowledge. Toronto
Studies in Philosophy. University of Toronto Press, Toronto; Buffalo, 2003.

[17] Charles W. Bytheway. FAST Creativity & Innovation: Rapidly Improving Processes, Product
Development and Solving Complex Problems. J. Ross Pub, Fort Lauderdale, Fla, 2007.

[18] Lukasz Cyra and Janusz Gorski. Support for argument structures review and assessment. Reliability
Engineering & System Safety, 96(1):26-37, January 2011.

[19] Arthur Dempster. Construction and Local Computation Aspects of Belief Functions. Influence
Diagrams, Belief Nets, and Decision Analysis, 47(3):121-141, 1990.

[20] Heather Douglas. Inductive Risk and Values in Science. Philosophy of Science, 67, December 2000.

[21] Heather E. Douglas. Science, Policy, and the Value-Free Ideal. University of Pittsburgh Press, 2009.

[22] Marcelo A. Falappa, Gabriele Kern-Isberner, and Guillermo R. Simari. Belief Revision and
Argumentation Theory. In Iyad Rahwan and Guillermo R. Simari, editors, Argumentation in Artificial
Intelligence. Springer Dordrecht Heidelberg, Boston, MA, 1st edition, July 2009.

[23] Luciano Floridi. The Method of Levels of Abstraction. Minds and Machines, 18(3):303-329, September
2008.




[24] Luciano Floridi. The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford
University Press, Oxford, United Kingdom, 2023.

[25] Luciano Floridi and Josh Cowls. A Unified Framework of Five Principles for AI in Society. In Luciano
Floridi, editor, Ethics, Governance, and Policies in Artificial Intelligence, Philosophical Studies Series,
pages 5-17. Springer International Publishing, Cham, 2021.

[26] Craig R. Fox and Gülden Ülkümen. Distinguishing Two Dimensions of Uncertainty. In Perspectives on
Thinking, Judging, and Decision Making: A Tribute to Karl Halvor Teigen, chapter 1.
Universitetsforlaget, 2011.

[27] Janusz Gorski, Lukasz Cyra, Aleksander Jarzebowicz, and Jakub Miler. Argument Strategies and
Patterns of the Trust-IT Framework. Polish Journal of Environmental Studies, 17, January 2008.

[28] Odd Haugen and Bjornar Vik. Quantitative software reliability methods: Can HIL testing data be used
to calculate probability of failure for control system software? Technical report, DNV, June 2019.

[29] Odd Ivar Haugen. Safety assurance of complex systems Part 1: Complexity. Whitepaper, DNV, Høvik,
Norway, 2019.

[30] Odd Ivar Haugen. Safety assurance of complex systems Part 2: Assurance and analysis. Whitepaper,
DNV, Høvik, Norway, 2019.

[31] Odd Ivar Haugen. Safety assurance of complex systems Part 3: Verification and evidence. Whitepaper,
DNV, Høvik, Norway, 2019.

[32] Odd Ivar Haugen. Developing a safety argument. In Demonstrating Safety of Software-Dependent
Systems: With Examples from Subsea Electric Technology, pages 55-82. DNV AS, Høvik, Norway,
March 2022.

[33] Odd Ivar Haugen. The Systems Approach. In Tore Myhrvold and Meine van der Meulen, editors,
Demonstrating Safety of Software-Dependent Systems: With Examples from Subsea Electric Technology,
pages 145-163. DNV AS, 2022.

[34] John H. Holland. Complexity: A Very Short Introduction. Number 392 in Very Short Introductions.
Oxford University Press, Oxford, United Kingdom, first edition, 2014.

[35] Erik Hollnagel. FRAM: The Functional Resonance Analysis Method, Modelling Complex Socio-Technical
Systems. Ashgate Publishing Limited, 2012.

[36] Audun Jøsang, Simon Pope, and Milan Daniel. Conditional Deduction Under Uncertainty. In Symbolic
and Quantitative Approaches to Reasoning with Uncertainty, page 835, Barcelona, Spain, July 2005.

[37] Daniel Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 1st edition, 2011.

[38] Nancy G. Leveson. Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press,
Cambridge, Massachusetts, January 2012.

[39] Liz McDaniel. What Is Bioethics? https://bioethics.msu.edu/what-is-bioethics.

[40] Melanie Mitchell. Complexity: A Guided Tour. Oxford University Press, New York, NY, 2011.

[41] Jennifer Nagel. Knowledge: A Very Short Introduction. Number 400 in Very Short Introductions.
Oxford University Press, Oxford, first edition, 2014.

[42] OECD. OECD Framework for the Classification of AI systems. Technical report, OECD, Paris, February
2022.

[43] Fabio Paglieri and Cristiano Castelfranchi. Argumentation and Data-oriented Belief Revision: On the
Two-Sided Nature of Epistemic Change. In CMNA IV: 4th Workshop on Computational, January 2004.




[44] Fabio Paglieri and Cristiano Castelfranchi. The Toulmin Test: Framing Argumentation within Belief
Revision Theories. In David Hitchcock and Bart Verheij, editors, Arguing on the Toulmin Model: New
Essays in Argument Analysis and Evaluation, pages 359-377. Springer Netherlands, Dordrecht, 2006.

[45] Bertrand Russell. Human Knowledge: Its Scope and Limits. Allen & Unwin, 1948.

[46] Kari Sentz and Scott Ferson. Combination of Evidence in Dempster-Shafer Theory. January 2002.

[47] Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.

[48] Glenn Shafer. Perspectives on the theory and practice of belief functions. International Journal of
Approximate Reasoning, 4(5):323-362, September 1990.

[49] Deborah Stone. Counting: How We Use Numbers to Decide What Matters. Liveright, first edition,
October 2020.

[50] Stephen E. Toulmin. The Uses of Argument. Cambridge University Press, 2nd edition, 2002.

[51] A. van Lamsweerde. Requirements Engineering: From System Goals to UML Models to Software
Specifications. John Wiley, Chichester, England; Hoboken, NJ, 2009.

[52] Bart Verheij. The Toulmin Argument Model in Artificial Intelligence - Or: How semi-formal, defeasible
argumentation schemes creep into logic. In Argumentation in Artificial Intelligence, pages 219-238.
Springer New York, NY, 1st edition, January 2009.

[53] M. Mitchell Waldrop. Complexity: The Emerging Science at the Edge of Order and Chaos. A Touchstone
Book. Touchstone, New York, NY, 1st Touchstone edition, 1993.

[54] Jamie Carlin Watson. Epistemic justification. Internet Encyclopedia of Philosophy.

[55] Gerald M. Weinberg. An Introduction to General Systems Thinking. Dorset House, 2001.

[56] Stephen Wolfram. Undecidability and intractability in Theoretical Physics. In Emergence, pages 387-393.
The MIT Press, Cambridge, Massachusetts, 2008.




